The following project presents an exploratory analysis of the historical data for January through July 2018 provided by the Ecobici website; the same link also provides what is needed to access the API offered by this service. The API documentation can be found in the following link.
Ecobici is a public bicycle-sharing service in Mexico City aimed at residents of the capital, its surroundings, and tourists.
Registered users may take a bicycle from any station and return it at the station closest to their destination, in an unlimited number of trips of up to 45 minutes each.
The service is accessed through an annual, weekly, three-day, or one-day subscription.
It operates from 5:00 a.m. to 12:00 a.m. every day of the year; it launched in February 2010 with 84 “cicloestaciones” (cycle stations) and 1,200 bicycles.
Ecobici currently has more than 170,000 registered users, and the service covers 55 neighborhoods of Mexico City over an area of 38 square kilometers.
The historical data was downloaded and stored in a database called ECOBICI; the scripts to export this DB can be found in the data folder of this project.
The ECOBICI DB contains 7 tables, one per month, that is, one for each file downloaded directly from the Ecobici website; each table contains the following fields:
The following code shows how to access a MySQL database; for this document, however, we worked directly from the exported files and built data frames from them, purely for practicality.
# install.packages("readr")
# install.packages("ggplot2")
# install.packages("stringr")
# install.packages("dplyr")
# install.packages("knitr")
# install.packages("tidyr")
# install.packages("lubridate")
# install.packages("ggmap")
# install.packages("plotly")
# install.packages("tidyverse")
# install.packages("RMySQL")
# install.packages("pool")
# install.packages("DBI")
# install.packages("googleway")

library(readr)
library(ggplot2)
library(stringr)
library(dplyr)
library(knitr) #To generate a cool table
library(tidyr) #For data manipulation
library(lubridate)
library(ggmap)
library(plotly)
library(tidyverse)
library(RMySQL)
library(pool)
library(DBI)
library(googleway)
library(httr)

Had we connected to the DB, we would have executed this code:
db.host <- 'localhost'
db.user <- 'root'
db.port <- 3306
db.password <- '$<your password>'

## DB Connection
db_connect <- function(db.name) {
  db <- dbPool(
    drv = RMySQL::MySQL(),
    dbname = db.name,
    host = db.host,
    user = db.user,
    password = db.password,
    port = as.numeric(db.port)
  )
  return(db)
}
Then we would have converted each month of data into a data frame.
January  <- tbl(db_connect('ECOBICI'), 'Enero')   %>% collect()
February <- tbl(db_connect('ECOBICI'), 'Febrero') %>% collect()
March    <- tbl(db_connect('ECOBICI'), 'Marzo')   %>% collect()
April    <- tbl(db_connect('ECOBICI'), 'Abril')   %>% collect()
May      <- tbl(db_connect('ECOBICI'), 'Mayo')    %>% collect()
June     <- tbl(db_connect('ECOBICI'), 'Junio')   %>% collect()
July     <- tbl(db_connect('ECOBICI'), 'Julio')   %>% collect()
Instead, we read the information directly from the “.csv” files:
January <- read_csv("data/2018-01.csv",col_names = T, na = c(""," ", "NA", "?"))
February <- read_csv("data/2018-02.csv",col_names = T, na = c(""," ", "NA", "?"))
March <- read_csv("data/2018-03.csv",col_names = T, na = c(""," ", "NA", "?"))
April <- read_csv("data/2018-04.csv",col_names = T, na = c(""," ", "NA", "?"))
May <- read_csv("data/2018-05.csv",col_names = T, na = c(""," ", "NA", "?"))
June <- read_csv("data/2018-06.csv",col_names = T, na = c(""," ", "NA", "?"))
July <- read_csv("data/2018-07.csv",col_names = T, na = c(""," ", "NA", "?"))
names(January) <- tolower(names(January))
names(February) <- tolower(names(February))
names(March) <- tolower(names(March))
names(April) <- tolower(names(April))
names(May) <- tolower(names(May))
names(June) <- tolower(names(June))
names(July) <- tolower(names(July))

It is important to examine what type of values our data contains, in order to clean or adjust them for our needs.
The following function changes the data type of genero_usuario and collapses the separate date and time columns into a single datetime value for both the recorded withdrawal and arrival times.
gender_arr_fix <- function(data){
new_data <- data %>%
mutate(genero_usuario = factor(genero_usuario, levels = c("M", "F")),
hora_arribo = str_replace(hora_arribo, "(\\w+:\\w+:\\w+)\\r", "\\1"),
fecha_retiro = dmy(fecha_retiro),
hora_retiro = hms(hora_retiro),
fecha_arribo = dmy(fecha_arribo),
hora_arribo = hms(hora_arribo)
) %>%
mutate(re_datetime = make_datetime(year(fecha_retiro),
month(fecha_retiro),
day(fecha_retiro),
hour(hora_retiro),
minute(hora_retiro),
second(hora_retiro)),
arr_datetime = make_datetime(year(fecha_arribo),
month(fecha_arribo),
day(fecha_arribo),
hour(hora_arribo),
minute(hora_arribo),
second(hora_arribo))) %>%
select(-c(fecha_retiro, hora_retiro, fecha_arribo, hora_arribo))
return(new_data)
}

The original data frames were then modified using the previous function.
January <- gender_arr_fix(January)
February <- gender_arr_fix(February)
March <- gender_arr_fix(March)
April <- gender_arr_fix(April)
May <- gender_arr_fix(May)
June <- gender_arr_fix(June)
July <- gender_arr_fix(July)

… and got this new information:
To profile the data, we used the DataProfiling module found in the documents of this project.
Note that edad_usuario, id_bici, ciclo_estacion_retiro, and ciclo_estacion_arribo are identifiers; even though they are numeric, they carry no relevant quantitative information, so we only examined characteristics that do not include numerical summaries.
First a data count and then a summary of each month was made.
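The profiling helper itself lives in the DataProfiling module, which is not reproduced here. As a rough sketch of what such a helper might compute (unique values, missing values, and the mode for categorical columns — the actual module may differ):

```r
library(dplyr)

# Hypothetical sketch of the profiling helper: for each column it reports
# the number of unique values, the count of NAs, and (for categorical
# profiling) the most frequent value.
profiling <- function(data, type = c("categorical", "other")) {
  type <- match.arg(type)
  stats <- lapply(data, function(col) {
    res <- data.frame(
      uniques = length(unique(col[!is.na(col)])),
      nan = sum(is.na(col))
    )
    if (type == "categorical") {
      # The mode: most frequent non-NA value.
      res$mode <- names(sort(table(col), decreasing = TRUE))[1]
    }
    res
  })
  # One row per column; row names come from the column names.
  do.call(rbind, stats)
}
```

This mirrors the columns seen in the tables below (uniques, nan, mode), but it is only an illustration of the idea, not the project's actual module.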
Jan_fac <- January %>% select(-c(re_datetime, arr_datetime))
Jan_date <- January %>% select(c(re_datetime, arr_datetime))

A table is built with formatting options so that large numbers use “,” as a thousands separator and scientific notation is avoided.
Jan_factor <- Jan_fac %>% profiling("categorical")
kable(Jan_factor, format.args = list(big.mark=",", scientific=F))

| | uniques | nan | mode |
|---|---|---|---|
| genero_usuario | 2 | 0 | M |
| edad_usuario | 67 | 0 | 28 |
| bici | 4,897 | 1 | 7376 |
| ciclo_estacion_retiro | 432 | 0 | 271 |
| ciclo_estacion_arribo | 435 | 0 | 1 |
For the datetime data, we only checked for null values and counted the unique values.
Jan_datetime <- Jan_date %>% profiling("other")
kable(Jan_datetime, format.args = list(big.mark=",", scientific=F))

| | uniques | nan |
|---|---|---|
| re_datetime | 545,217 | 0 |
| arr_datetime | 545,660 | 0 |
There were no null values.
Feb_fac <- February %>% select(-c(re_datetime, arr_datetime))
Feb_date <- February %>% select(c(re_datetime, arr_datetime))
Feb_factor <- Feb_fac %>% profiling("categorical")
kable(Feb_factor, format.args = list(big.mark=",", scientific=F))

| | uniques | nan | mode |
|---|---|---|---|
| genero_usuario | 2 | 0 | M |
| edad_usuario | 73 | 0 | 28 |
| bici | 5,058 | 44 | 2019 |
| ciclo_estacion_retiro | 476 | 0 | 271 |
| ciclo_estacion_arribo | 476 | 0 | 43 |
Feb_datetime <- Feb_date %>% profiling("other")
kable(Feb_datetime, format.args = list(big.mark=",", scientific=F))

| | uniques | nan |
|---|---|---|
| re_datetime | 541,252 | 0 |
| arr_datetime | 541,095 | 0 |
Mar_fac <- March %>% select(-c(re_datetime, arr_datetime))
Mar_date <- March %>% select(c(re_datetime, arr_datetime))
Mar_factor <- Mar_fac %>% profiling("categorical")
kable(Mar_factor, format.args = list(big.mark=",", scientific=F))

| | uniques | nan | mode |
|---|---|---|---|
| genero_usuario | 2 | 0 | M |
| edad_usuario | 72 | 0 | 28 |
| bici | 4,931 | 122 | 2698 |
| ciclo_estacion_retiro | 476 | 0 | 271 |
| ciclo_estacion_arribo | 478 | 0 | 27 |
Mar_datetime <- Mar_date %>% profiling("other")
kable(Mar_datetime, format.args = list(big.mark=",", scientific=F))

| | uniques | nan |
|---|---|---|
| re_datetime | 580,412 | 0 |
| arr_datetime | 579,596 | 0 |
A_fac <- April %>% select(-c(re_datetime, arr_datetime))
A_date <- April %>% select(c(re_datetime, arr_datetime))
A_factor <- A_fac %>% profiling("categorical")
kable(A_factor, format.args = list(big.mark=",", scientific=F))

| | uniques | nan | mode |
|---|---|---|---|
| genero_usuario | 2 | 0 | M |
| edad_usuario | 71 | 0 | 28 |
| bici | 4,889 | 132 | 11065 |
| ciclo_estacion_retiro | 478 | 0 | 27 |
| ciclo_estacion_arribo | 478 | 0 | 27 |
A_datetime <- A_date %>% profiling("other")
kable(A_datetime, format.args = list(big.mark=",", scientific=F))

| | uniques | nan |
|---|---|---|
| re_datetime | 589,901 | 0 |
| arr_datetime | 589,669 | 0 |
May_fac <- May %>% select(-c(re_datetime, arr_datetime))
May_date <- May %>% select(c(re_datetime, arr_datetime))
May_factor <- May_fac %>% profiling("categorical")
kable(May_factor, format.args = list(big.mark=",", scientific=F))

| | uniques | nan | mode |
|---|---|---|---|
| genero_usuario | 2 | 0 | M |
| edad_usuario | 72 | 0 | 28 |
| bici | 4,982 | 0 | 15259 |
| ciclo_estacion_retiro | 480 | 0 | 271 |
| ciclo_estacion_arribo | 480 | 0 | 27 |
May_datetime <- May_date %>% profiling("other")
kable(May_datetime, format.args = list(big.mark=",", scientific=F))

| | uniques | nan |
|---|---|---|
| re_datetime | 618,668 | 0 |
| arr_datetime | 618,486 | 0 |
Jun_fac <- June %>% select(-c(re_datetime, arr_datetime))
Jun_date <- June %>% select(c(re_datetime, arr_datetime))
Jun_factor <- Jun_fac %>% profiling("categorical")
kable(Jun_factor, format.args = list(big.mark=",", scientific=F))

| | uniques | nan | mode |
|---|---|---|---|
| genero_usuario | 2 | 0 | M |
| edad_usuario | 71 | 0 | 28 |
| bici | 4,980 | 0 | 2789 |
| ciclo_estacion_retiro | 479 | 0 | 271 |
| ciclo_estacion_arribo | 479 | 0 | 27 |
Jun_datetime <- Jun_date %>% profiling("other")
kable(Jun_datetime, format.args = list(big.mark=",", scientific=F))

| | uniques | nan |
|---|---|---|
| re_datetime | 545,135 | 0 |
| arr_datetime | 545,241 | 0 |
July_fac <- July %>% select(-c(re_datetime, arr_datetime))
July_date <- July %>% select(c(re_datetime, arr_datetime))
July_factor <- July_fac %>% profiling("categorical")
kable(July_factor, format.args = list(big.mark=",", scientific=F))

| | uniques | nan | mode |
|---|---|---|---|
| genero_usuario | 2 | 0 | M |
| edad_usuario | 70 | 0 | 28 |
| bici | 4,929 | 0 | 9581 |
| ciclo_estacion_retiro | 480 | 0 | 27 |
| ciclo_estacion_arribo | 480 | 0 | 27 |
July_datetime <- July_date %>% profiling("other")
kable(July_datetime, format.args = list(big.mark=",", scientific=F))

| | uniques | nan |
|---|---|---|
| re_datetime | 568,438 | 0 |
| arr_datetime | 569,168 | 0 |
The data shown above made an exploratory data analysis possible.
Note that station 271 has the highest number of records of users taking a bicycle in the months of:
On the other hand, station 27 is the most frequented station for taking a bicycle in April and July. It is also the station where most users returned a bicycle in the months of:
Now, several questions arose about the data.
Where are stations 27 and 271 located, as well as stations 1 and 43?
A map of all the stations can be found on the official Ecobici website, and the corresponding file can also be found in this link.
The location of each of the 480 stations can be found on the following page.
Alternatively, it can be obtained from the API, which requires an access token obtained with each user's credentials; this token expires after one hour. (Put in a browser: https://pubsbapi.smartbike.com/oauth/v2/token?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&grant_type=client_credentials.)
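As a sketch, the same token request can also be made from R with httr instead of the browser. CLIENT_ID and CLIENT_SECRET are placeholders for your own credentials, and the access_token field name assumes the standard OAuth2 response shape:

```r
library(httr)
library(jsonlite)

# Placeholder credentials; replace with the ones issued to your account.
client_id <- "{CLIENT_ID}"
client_secret <- "{CLIENT_SECRET}"

# Same endpoint and parameters as the browser URL above.
resp <- GET(
  "https://pubsbapi.smartbike.com/oauth/v2/token",
  query = list(
    client_id = client_id,
    client_secret = client_secret,
    grant_type = "client_credentials"
  )
)

# Parse the JSON body and pull out the (one-hour) access token.
token <- fromJSON(content(resp, "text", encoding = "UTF-8"))$access_token
```

The token obtained this way is the one interpolated into the stations request further below.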
Before using the API, we needed to obtain our own credentials on this page. We then received a JSON file with the stations of the Ecobici service:
ecobici_api <- function(path) {
  url <- modify_url("https://pubsbapi.smartbike.com", path = path)
  resp <- GET(
    url,
    add_headers(charset = "UTF-8")
  )
  if (http_type(resp) != "application/json") {
    stop("API did not return json", call. = FALSE)
  }
  parsed <- jsonlite::fromJSON(content(resp, "text", encoding = "UTF-8"), simplifyVector = FALSE)
  if (http_error(resp)) {
    stop(
      sprintf(
        "Ecobici API request failed [%s]\n%s\n<%s>",
        status_code(resp),
        parsed$message,
        parsed$documentation_url
      ),
      call. = FALSE
    )
  }
  structure(
    list(
      content = parsed,
      path = path,
      response = resp
    ),
    class = "ecobici_api"
  )
}

print.ecobici_api <- function(x, ...) {
  cat("<Ecobici ", x$path, ">\n", sep = "")
  ## Uncomment to see the whole JSON structure
  ## str(x$content)
  invisible(x)
}
ecobici_api("/api/v1/stations.json?access_token=${your_access_token}")

## <Ecobici /api/v1/stations.json?access_token=${your_access_token}>
Before each geocoding operation, an API_KEY must be obtained from the Google console, so we had to create a project and enable the Google Maps API.
first_station <- google_geocode(address = "Paseo de la Reforma y Havre, Juárez, 06600 Ciudad de México", key = key, simplify = TRUE)
coord <- first_station$results$geometry$location
name <- first_station$results$address_components[[1]]$long_name
google_map(key = key, data = coord, zoom = 1) %>%
add_markers(lat = "lat", lon = "lng", info_window = name)

## Warning in data.frame(..., check.names = FALSE): row names were found from
## a short variable and have been discarded
# Deprecated code of previous version
# first_station <- geocode('Paseo de la Reforma y Havre, Juárez, 06600 Ciudad de México',
# source = "google")
# map_first_station <- get_map(location = as.numeric(first_station),
# color = "color",
# maptype = "roadmap",
# scale = 2,
# zoom = 16)
# ggmap(map_first_station) + geom_point(aes(x = lon, y = lat),
# data = first_station , colour = 'green',
# shape = 20, size = 10, fill= "green")

second_station <- google_geocode(address = "Jesús García 271, Buenavista, 06350 Ciudad de México, CDMX, México", key = key, simplify = TRUE)
coord <- second_station$results$geometry$location
name <- second_station$results$address_components[[1]]$long_name
google_map(key = key, data = coord, zoom = 2) %>%
add_markers(lat = "lat", lon = "lng", info_window = name)

## Warning in data.frame(..., check.names = FALSE): row names were found from
## a short variable and have been discarded
# Deprecated code of previous version
# second_station <- geocode('Jesús García 271, Buenavista, 06350 Ciudad de México, CDMX, México',
# source = "google")
#
# map_second_station <- get_map(location = as.numeric(second_station),
# color = "color",
# maptype = "roadmap",
# scale = 2,
# zoom = 16)
#
# ggmap(map_second_station) + geom_point(aes(x = lon, y = lat),
# data = second_station , colour = 'green',
# shape = 20, size = 10, fill= "green")

third_station <- google_geocode(address = 'Rio Sena y Rio balsas, Ciudad de México, CDMX, México', key = key, simplify = TRUE)
coord <- third_station$results$geometry$location
name <- third_station$results$address_components[[1]]$long_name
google_map(key = key, data = coord, zoom = 2) %>%
add_markers(lat = "lat", lon = "lng", info_window = name)

## Warning in data.frame(..., check.names = FALSE): row names were found from
## a short variable and have been discarded
# Deprecated code of previous version
# third_station <- geocode('Rio Sena y Rio balsas, Ciudad de México, CDMX, México',
# source = "google")
#
# map_third_station <- get_map(location = as.numeric(third_station),
# color = "color",
# maptype = "roadmap",
# scale = 2,
# zoom = 16)
#
# ggmap(map_third_station) + geom_point(aes(x = lon, y = lat),
# data = third_station , colour = 'green',
# shape = 20, size = 10, fill= "green")

fourth_station <- google_geocode(address = 'Juarez y Revillagigedo, Ciudad de México, CDMX, México', key = key, simplify = TRUE)
coord <- fourth_station$results$geometry$location
name <- fourth_station$results$address_components[[1]]$long_name
google_map(key = key, data = coord, zoom = 2) %>%
add_markers(lat = "lat", lon = "lng", info_window = name)

## Warning in data.frame(..., check.names = FALSE): row names were found from
## a short variable and have been discarded
# Deprecated code of previous version
# fourth_station <- geocode('Juarez y Revillagigedo, Ciudad de México, CDMX, México',
# source = "google")
#
# map_fourth_station <- get_map(location = as.numeric(fourth_station),
# color = "color",
# maptype = "roadmap",
# scale = 2,
# zoom = 16)
#
# ggmap(map_fourth_station) + geom_point(aes(x = lon, y = lat),
# data = fourth_station , colour = 'green',
# shape = 20, size = 10, fill= "green")

The busiest station was the one located on “Reforma”, near “Reforma 222”, a shopping mall that also houses offices.
Two stations are located near the Buenavista subway station, a connection point for many people coming from the “Estado de México” and other parts of Mexico City on their way to the areas with the highest density of jobs.
Finally, station 43 (out of operation at the time of writing) is heavily used because of its proximity to “Reforma” avenue and the downtown area.
To work with all the data collected from January to July 2018, the monthly data sets were combined in order to obtain the average time that users spend on the Ecobici service, across all stations with recorded usage.
The usage time of each trip was then calculated:
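The combined data frame used below, historicos_ecobici, can be built by stacking the monthly data frames. A minimal sketch with dplyr, assuming the seven data frames share the same columns after gender_arr_fix:

```r
library(dplyr)

# Stack the seven monthly data frames into one historical data set.
historicos_ecobici <- bind_rows(January, February, March, April, May, June, July)
```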
historicos_ecobici <- historicos_ecobici %>%
mutate(duracion = as.duration(arr_datetime-re_datetime))
historicos_ecobici

The data was then arranged by “duracion”:
There are records where users took 0 seconds, or as little as 1 second, to return a bike at a different station; this is most likely due to two main causes:
There are also records where users took days, weeks, and even up to 1.53 years to return a bicycle.
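A quick way to inspect these extreme durations (a sketch, assuming the combined historicos_ecobici data frame with its duracion column) is to sort at both ends:

```r
library(dplyr)

# Shortest trips: near-instant returns of 0 or 1 second.
historicos_ecobici %>% arrange(duracion) %>% head()

# Longest trips: bicycles returned days, weeks, or over a year later.
historicos_ecobici %>% arrange(desc(duracion)) %>% head()
```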
To avoid records where a user made a mistake, did not return the bicycle, or exceeded the regulation time, the data was filtered to keep only trips lasting at least 30 seconds between stations and at most the 45 minutes allowed by the Ecobici regulations.
new_historicos_ecobici <- historicos_ecobici %>% filter(duracion <= dminutes(45))
new_historicos_ecobici <- new_historicos_ecobici %>% filter(dseconds(30) <= duracion)
new_historicos_ecobici %>% arrange(duracion)

promedio_tiempo_uso <- mean(new_historicos_ecobici$duracion)
promedio_tiempo_uso <- minute(seconds_to_period(round(promedio_tiempo_uso)))
print(str_c("The average time of use of an ecobici within the regulatory time, in Mexico City is of: ", promedio_tiempo_uso, " minutes"))

## [1] "The average time of use of an ecobici within the regulatory time, in Mexico City is of: 13 minutes"
With this new cleaning, new records were obtained between the 4 main stations (271, 27, 1, and 43).
new_historicos_ecobici %>%
filter(ciclo_estacion_retiro %in% c(1, 43, 27, 271) | ciclo_estacion_arribo %in% c(1, 43, 27, 271))

Continuing with the data, the following graph shows the proportion of Ecobici service use by registered age.
By convention, records where the age is greater than 85 years were omitted.
new_historicos_ecobici %>%
filter(edad_usuario < 85) %>%
group_by(edad_usuario) %>%
dplyr::summarise(conteo = n()) %>%
mutate(proportion = conteo/sum(conteo)) %>%
ggplot(aes(x = edad_usuario, y = proportion, fill = proportion)) +
geom_bar(stat = "identity") +
labs(x = "User age", y = "Proportion") +
ggtitle("Proportion of use of the service over the age of the users")

This will help determine when higher demand for the service, and therefore greater bicycle availability, is required. The re_datetime column is used because it marks when users start a trip; during the day it should not differ much from arr_datetime, since in the filtered data usage time is under 45 minutes.
por_hora_del_dia <- new_historicos_ecobici %>%
mutate(hora = hour(re_datetime)) %>%
group_by(hora) %>%
dplyr::summarise(conteo = n()) %>%
ggplot(aes(x = hora, y = conteo, fill = conteo)) +
geom_bar(stat = "identity") +
ggtitle("Number of active users per hour who take a bicycle") +
labs(x = "Hour of day", y = "Number of users")
ggplotly(por_hora_del_dia)

por_hora_del_dia <- new_historicos_ecobici %>%
mutate(hora = hour(arr_datetime)) %>%
group_by(hora) %>%
dplyr::summarise(conteo = n()) %>%
ggplot(aes(x = hora, y = conteo, fill = conteo)) +
geom_bar(stat = "identity")+
ggtitle("Number of active users per hour returning a bicycle") +
labs(x = "Hour of day", y = "Number of users")
ggplotly(por_hora_del_dia)

To obtain the top 10 hours with the greatest demand for Ecobici, users were counted by hour:
new_historicos_ecobici %>%
mutate(hora = hour(re_datetime)) %>%
group_by(hora) %>%
dplyr::summarise(conteo = n()) %>%
arrange(desc(conteo)) %>%
head(10)

## # A tibble: 10 x 2
## hora conteo
## <int> <int>
## 1 18 483166
## 2 8 478318
## 3 19 391346
## 4 9 370395
## 5 17 353746
## 6 14 348941
## 7 15 347330
## 8 16 287176
## 9 13 286461
## 10 7 261877
Interestingly, these hours coincide with the usual times for starting and leaving work in Mexico City, which makes sense, since many people commute on these bicycles.
One more analysis remains: determining which days of the week have the highest demand for bicycles, based on the number of users taking one.
por_dia_semana <- new_historicos_ecobici %>%
mutate(dia_semana = format(as.Date(re_datetime),"%A")) %>%
group_by(dia_semana) %>%
dplyr::summarise(conteo = n()) %>%
arrange(desc(conteo))
por_dia_semana

## # A tibble: 7 x 2
## dia_semana conteo
## <chr> <int>
## 1 Tuesday 900937
## 2 Wednesday 895546
## 3 Thursday 865172
## 4 Monday 822882
## 5 Friday 815889
## 6 Saturday 374238
## 7 Sunday 323518
dias_sem <- ggplot(por_dia_semana, aes(x=dia_semana, y=conteo, fill= dia_semana)) +
geom_bar(stat = "identity") +
ggtitle("Number of users per day of the week") +
labs(x = "Day of week", y = "Number of users")
ggplotly(dias_sem)

It is clear that activity on weekends drops to less than half the usual level, and that the busiest day is Tuesday.
gender <- ggplot(new_historicos_ecobici, aes(genero_usuario, fill = genero_usuario)) +
geom_bar() +
scale_x_discrete(drop = FALSE) +
ggtitle("Number of users by gender in the registers") +
labs(x = "User genre", y = "Number of users")
ggplotly(gender)

The difference between the number of men who use this service and the number of women is remarkable.
# Before searching, an API Key must be created https://dev.twitter.com/apps
consumer_key <- "${your_consumer_key}"
consumer_secret <- "${your_consumer_secret}"
# Basic Auth configuration
secret <- jsonlite::base64_enc(paste(consumer_key, consumer_secret, sep = ":"))
req <- httr::POST("https://api.twitter.com/oauth2/token",
httr::add_headers(
"Authorization" = paste("Basic", gsub("\n", "", secret)),
"Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8"
),
body = "grant_type=client_credentials"
);
# Extraction of access token
httr::stop_for_status(req, "authenticate with twitter")
token <- paste("Bearer", httr::content(req)$access_token)
# Call to the API
url <- "https://api.twitter.com/1.1/search/tweets.json?q=ecobici&result_type=mixed"
req <- httr::GET(url, httr::add_headers(Authorization = token))
json <- httr::content(req, as = "text")
tweets <- jsonlite::fromJSON(json)
substring(tweets$statuses$text, 1, 100)

## [1] "Es increíble que con @ecobici:\n\u274cEl @GobCDMX pague 180 millones anuales a Clear Channel por el servic"
## [2] "Te van a criticar por todo.\n\nTú sigue pasando tu tarjeta por el lector al finalizar tus viajes aunqu"
## [3] "Les presentamos la nueva FORD EXPLORER BOLARDO la SUV de mal gusto, pero buena reversa! @BJAlcaldia "
## [4] "RT @AngeliqueMera: Es mi hermano. \n\nTrabaja en @ecobici y desde ayer no llegó a trabajar. \n\n#TeBusc"
## [5] "RT @Reporte_Indigo: Mientras más personas se convierten en usuarios de las Ecobicis, el 30% de las b"
## [6] "RT @AngeliqueMera: \U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\n\nMi hermano trabaja en @ecobici ayer salió de casa y no llegó a traba"
## [7] "@AngyBeckham Hola Angie, lamentamos los inconvenientes, nuestro equipo revisará lo ocurrido con la A"
## [8] "@ecobici qué onda con su app, en el mapa indica bicis disponibles y en las 4 que ya recorrí no hay n"
## [9] "\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\n\nMi hermano salió ayer rumbo a su trabajo en @ecobici y no llegó, no sabemos nada de él.… ht"
## [10] "RT @XochitlGalvez: Es increíble que con @ecobici:\n\u274cEl @GobCDMX pague 180 millones anuales a Clear Ch"
## [11] "RT @AngeliqueMera: \U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\n\nMi hermano trabaja en @ecobici ayer salió de casa y no llegó a traba"
## [12] "RT @AngeliqueMera: \U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\n\nMi hermano trabaja en @ecobici ayer salió de casa y no llegó a traba"
## [13] "@LuisLobato Hola, trabajaremos con las instancias correspondientes para atender el reporte"
## [14] "@ecobici la estación 371 Tlacoquemecatl, bloqueada por dos autos desde hace más de 25 minutos. https"
## [15] "RT @AngeliqueMera: \U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\U0001f6a8\n\nMi hermano trabaja en @ecobici ayer salió de casa y no llegó a traba"